Problem Set 2

Forecasting/Predictive Analysis

Hyndman Book Chapter 3.7


Name: Sherry Peng Tian

Date: Oct. 7, 2019

In [1]:
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.1  IRdisplay_0.7.0 pbdZMQ_0.3-3    tools_3.5.1    
 [5] htmltools_0.3.6 base64enc_0.1-3 crayon_1.3.4    Rcpp_1.0.2     
 [9] uuid_0.1-2      IRkernel_1.0.2  jsonlite_1.6    digest_0.6.18  
[13] repr_0.19.2     evaluate_0.13  

Exercise 1

For each series, an appropriate Box-Cox transformation can be found with BoxCox.lambda(), which automatically selects a value of λ that makes the size of the seasonal variation roughly constant across the series.
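Before running it on the datasets, it helps to see what the transform itself does. Below is a minimal base-R sketch of the Box-Cox formula that BoxCox() applies; box_cox is a hypothetical helper written here for illustration, not part of fpp2.

```r
# Box-Cox transform: log(y) when lambda = 0, otherwise a power transform
# rescaled so the family is continuous in lambda.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# lambda = 0 is the natural log:
box_cox(exp(1), 0)   # 1
# lambda = 0.5 is a rescaled square root: (sqrt(4) - 1) / 0.5 = 2
box_cox(4, 0.5)      # 2
```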

In [2]:
library(fpp2)

#help(usnetelec)
#help(usgdp)
#help(mcopper)
#help(enplanements)
Loading required package: ggplot2
Loading required package: forecast
Warning message:
“package ‘forecast’ was built under R version 3.5.2”
Loading required package: fma
Loading required package: expsmooth
In [3]:
lambda <- BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec, lambda))
lambda
0.516771443964645
In [4]:
lambda <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda))
lambda
0.366352049520934
In [5]:
lambda <- BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, lambda))
lambda
0.191904709003829
In [6]:
lambda <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda))
lambda
-0.226946111237065

Exercise 2

help(cangas)

Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005.

In [7]:
autoplot(cangas)
In [8]:
lambda <- BoxCox.lambda(cangas)
autoplot(BoxCox(cangas, lambda))
lambda
0.576775938228139
In [9]:
cangas
(Printed output: the full monthly cangas series, Jan 1960 – Feb 2005. Production rises from roughly 1 billion cubic metres per month in 1960 to roughly 19.5 in 2004, with peaks in winter (December–January) and troughs in summer.)

Why is a Box-Cox transformation unhelpful for the cangas data?

From the plots, the autoplot() graph barely changes even after applying the best λ for the Box-Cox transformation. The transformation is useful when the size of the seasonal variation grows (or shrinks) steadily with the level of the series; in cangas the seasonal variation first increases, then shrinks through the 1980s, then increases again, so no single λ can make it uniform and the transformation does not help.

If λ = 1, then $w_t = y_t − 1$, so the transformed data is shifted downwards but there is no change in the shape of the time series. For all other values of λ, the time series changes shape.
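The λ = 1 case can be checked directly from the formula $(y_t^\lambda - 1)/\lambda$; this base-R check assumes nothing beyond arithmetic on a toy vector.

```r
# With lambda = 1, the Box-Cox transform reduces to y - 1:
# (y^1 - 1) / 1 = y - 1, a pure downward shift with no change in shape.
y <- c(2, 5, 9)
w <- (y^1 - 1) / 1
identical(w, y - 1)   # TRUE
```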

Exercise 4

In [10]:
lambda <- BoxCox.lambda(dole)
autoplot(BoxCox(dole, lambda))
In [11]:
autoplot(dole)

dole: the Box-Cox transformation takes $\lambda \approx 0.3291$ and produces a plot whose seasonal variation is more uniform and whose shape is closer to normal.

In [12]:
lambda <- BoxCox.lambda(usdeaths)
autoplot(BoxCox(usdeaths, lambda))

usdeaths: essentially unchanged; the Box-Cox transformation makes little visible difference here.

In [13]:
lambda <- BoxCox.lambda(bricksq)
autoplot(BoxCox(bricksq, lambda))

bricksq: little visible change from the transformation. Because the size of the seasonal variation in bricksq does not grow steadily with the level of the series, the $(y_t^\lambda - 1)/\lambda$ transformation has little variation to stabilize.

Exercise 5

In [14]:
beer <- window(ausbeer, start = 1992)
fc <- snaive(beer)
autoplot(fc)
res <- residuals(fc)
autoplot(res)
In [15]:
checkresiduals(fc)
	Ljung-Box test

data:  Residuals from Seasonal naive method
Q* = 32.269, df = 8, p-value = 8.336e-05

Model df: 0.   Total lags used: 8

I conclude that the seasonal naïve forecast may not be an appropriate approach for the quarterly Australian beer production data because the residuals do not look like white noise. First, the residuals hover around 0 but sit more often below the x-axis, suggesting a small bias. Second, the ACF shows a significant spike around lag 4, so the residuals are autocorrelated rather than white noise, and the residual histogram is not quite normal. The Ljung-Box p-value (8.3e-05) confirms that the residuals are distinguishable from white noise.
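The Q* statistic that checkresiduals() reports comes from the Ljung-Box formula $Q^* = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$. As a sanity check, here is a base-R sketch that computes it by hand on a simulated white-noise series and compares it with stats::Box.test; no fpp2 objects are needed.

```r
# Compute the Ljung-Box Q* statistic by hand and compare with Box.test().
set.seed(1)
x <- rnorm(100)          # simulated white noise, not the beer residuals
n <- length(x)
lags <- 8
r <- acf(x, lag.max = lags, plot = FALSE)$acf[-1]   # sample ACF, lag 0 dropped
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(lags)))
bt <- Box.test(x, lag = lags, type = "Ljung-Box")
all.equal(Q, unname(bt$statistic))   # TRUE
```

For true white noise the p-value should be large; the tiny p-value for the beer residuals is what rules out the white-noise hypothesis.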

Exercise 8

In [16]:
retaildata <- readxl::read_excel("/Users/apple/Desktop/bc_f19_econ/Forecasting/data/retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))

# split two parts 
myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start = 2011)

autoplot(myts) + 
    autolayer(myts.train, series = "Training") + 
    autolayer(myts.test, series = "Test")
In [17]:
# seasonal naïve forecasts 
fc <- snaive(myts.train)

# compare accuracy on the training and test sets 
accuracy(fc, myts.test)
                     ME     RMSE      MAE       MPE      MAPE     MASE      ACF1 Theil's U
Training set   7.772973 20.24576 15.95676  4.702754  8.109777 1.000000 0.7385090        NA
Test set      55.300000 71.44309 55.78333 14.900996 15.082019 3.495907 0.5315239  1.297866
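The headline columns of accuracy() have simple definitions, sketched below in base R on toy vectors (the numbers are illustrative, not the retail data).

```r
# Hand-computed error measures on a toy actual/predicted pair.
actual <- c(100, 110, 120, 130)
pred   <- c( 98, 112, 118, 135)
e <- actual - pred                      # forecast errors

ME   <- mean(e)                         # mean error (sign matters: bias)
RMSE <- sqrt(mean(e^2))                 # root mean squared error
MAE  <- mean(abs(e))                    # mean absolute error
MAPE <- mean(abs(e / actual)) * 100     # mean absolute percentage error
c(ME = ME, RMSE = RMSE, MAE = MAE, MAPE = MAPE)
```

The much larger test-set values in the table above (RMSE 71.4 vs 20.2) show how much worse the seasonal naïve method does out of sample.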
In [18]:
checkresiduals(fc)
	Ljung-Box test

data:  Residuals from Seasonal naive method
Q* = 624.45, df = 24, p-value < 2.2e-16

Model df: 0.   Total lags used: 24

e.

The residuals are roughly normally distributed, but the ACF shows many significant autocorrelations exceeding both the upper and lower bounds, so the residuals are not white noise.

f.

How sensitive are the accuracy measures to the training/test split?
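One way to probe this question is to re-run the same comparison for several split points and watch the test-set RMSE move. The base-R sketch below does this on a simulated monthly series (not the retail data) with a hand-rolled seasonal naïve forecast: each test month is predicted by the same month one year earlier.

```r
# Probe sensitivity of test RMSE to the train/test split point.
set.seed(42)
y <- ts(10 + 3 * sin(2 * pi * (1:120) / 12) + rnorm(120), frequency = 12)

for (cut in c(84, 96, 108)) {
  train <- y[1:cut]
  test  <- y[(cut + 1):120]
  # seasonal naive: recycle the last 12 training observations
  fc <- rep(tail(train, 12), length.out = length(test))
  cat("split at", cut, "-> test RMSE:",
      round(sqrt(mean((test - fc)^2)), 3), "\n")
}
```

If the RMSE varies noticeably across split points, the accuracy measures are sensitive to where the series is cut, which is one motivation for time-series cross-validation.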

Exercise 12

In [19]:
hsales
(Printed output: the full monthly hsales series, Jan 1973 – Nov 1995, monthly sales of new one-family houses in the USA, with values mostly between the high 20s and high 80s and clear annual seasonality.)
In [20]:
autoplot(hsales)
In [21]:
lambda <- BoxCox.lambda(hsales)
autoplot(BoxCox(hsales, lambda))
lambda
0.1454607801013

# Not a huge change in the plot
In [22]:
hsales.train <- window(hsales, end = c(1993, 12))
hsales.test <- window(hsales, start = 1994)

autoplot(hsales) + 
    autolayer(hsales.train, series = "Train") + 
    autolayer(hsales.test, series = "Test")
In [23]:
fc1 <- meanf(hsales.train)
fc2 <- naive(hsales.train)
fc3 <- rwf(hsales.train)             # rwf() is an equivalent alternative to naive()
fc4 <- snaive(hsales.train)
fc5 <- rwf(hsales.train, drift = TRUE)
In [24]:
identical(fc2, fc3)

all.equal(fc2, fc3)
FALSE
  1. 'Component “method”: 1 string mismatch'
  2. 'Component “model”: Component “call”: target, current do not match when deparsed'
In [25]:
autoplot(hsales) + 
    autolayer(meanf(hsales.train, h = 50), 
              series = "Mean", PI = FALSE) + 
    autolayer(naive(hsales.train, h = 50), 
             series = "Naïve", PI = FALSE) + 
    autolayer(rwf(hsales.train, drift = TRUE, h = 50), 
             series = "Drift", PI = FALSE) + 
    autolayer(snaive(hsales.train, h = 50), 
             series = "Seasonal Naïve", PI = FALSE) + 
    ggtitle("New One-Family House in the US") + 
    xlab("Year") + ylab("Monthly sales") + 
    guides(color = guide_legend(title = "Forecast Train"))

From the comparison graph, the seasonal naïve method tracks the test set best, so we choose snaive() and keep fc4 as the final method.

After checking the residuals of the seasonal naïve method, the residual histogram looks roughly normal; however, the ACF shows many significant autocorrelations beyond both bounds, and the residuals scatter around 0 while lying more often below the x-axis. Overall, none of the simple benchmark methods covered so far may be a proper model for the house sales data.
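The four benchmarks being compared all have simple closed-form point forecasts. The base-R sketch below spells them out on a toy quarterly series; the helper names are hypothetical, not the forecast-package functions.

```r
# Closed-form point forecasts for the four benchmark methods.
y <- c(8, 10, 9, 12, 11, 13, 12, 15)   # toy series, seasonal period m = 4
n <- length(y); h <- 4; m <- 4

mean_fc   <- rep(mean(y), h)                         # meanf(): average of all obs
naive_fc  <- rep(y[n], h)                            # naive() / rwf(): last obs
drift_fc  <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)  # rwf(drift = TRUE)
snaive_fc <- rep(tail(y, m), length.out = h)         # snaive(): same quarter last year
```

Writing them out this way also makes clear why naive() and rwf() produced identical forecasts in the all.equal() comparison above: they are the same random-walk method, differing only in how the call is recorded.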

In [26]:
checkresiduals(fc4)
	Ljung-Box test

data:  Residuals from Seasonal naive method
Q* = 684.66, df = 24, p-value < 2.2e-16

Model df: 0.   Total lags used: 24

Problem Set 1

Forecasting/Predictive Analysis

Name: Sherry Peng Tian

Date: Sep. 17, 2019


Hyndman Book Chapter 2.10

Exercise 1

In [27]:
library(ggplot2)
#install.packages('fpp2')
library(fpp2)
In [28]:
?gold

Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.

In [29]:
?woolyrnq

Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.

In [30]:
?gas

Australian monthly gas production: 1956–1995.

Exercise 2

In [32]:
tute1 <- read.csv("/Users/apple/Desktop/bc_f19_econ/Forecasting/data/tute1.csv", header = TRUE)
# b. 
mytimeseries <- ts(tute1[,-1], start = 1981, frequency = 4)
# The [,-1] removes the first column which contains the quarters.
In [33]:
# c. 
autoplot(mytimeseries, facets = TRUE)
In [34]:
autoplot(mytimeseries)
In [35]:
ggplot(tute1, aes(GDP, Sales)) + 
    geom_point(colour = "red", size = 3)
In [36]:
plot(mytimeseries)

Exercise 3

In [37]:
# a. 
retaildata <- readxl::read_excel("/Users/apple/Desktop/bc_f19_econ/Forecasting/data/retail.xlsx", skip = 1)
# skip=1 is required because the Excel sheet has two header rows
In [38]:
samplets <- ts(retaildata[, "A3349873A"], 
              frequency = 12, start = c(1982, 4))
In [39]:
# b. 
myts <- ts(retaildata[, "A3349588R"], 
          frequency = 12, start = c(1982, 4))
In [40]:
autoplot(samplets)
In [41]:
ggseasonplot(myts)

Exercise 4

In [42]:
help(bicoal)
autoplot(bicoal)
In [43]:
help(chicken)
autoplot(chicken)
In [44]:
help(dole)
autoplot(dole)
In [45]:
help(usdeaths)
autoplot(usdeaths)
In [46]:
help(lynx)
autoplot(lynx)
In [49]:
help(goog)
autoplot(goog) + 
    ggtitle("Google Stock Prices") + 
    xlab("Time") + ylab("Daily Prices $")